Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction

نویسندگان

  • Xin Sui
  • Tsung-Hsien Lee
  • Joyce Jiyoung Whang
  • Berkant Savas
  • Saral Jain
  • Keshav Pingali
  • Inderjit S. Dhillon
چکیده

Social network analysis has become a major research area that has impact in diverse applications ranging from search engines to product recommendation systems. A major problem in implementing social network analysis algorithms is the sheer size of many social networks, for example, the Facebook graph has more than 900 million vertices and even small networks may have tens of millions of vertices. One solution to dealing with these large graphs is dimensionality reduction using spectral or SVD analysis of the adjacency matrix of the network, but these global techniques do not necessarily take into account local structures or clusters of the network that are critical in network analysis. A more promising approach is clustered low-rank approximation: instead of computing a global low-rank approximation, the adjacency matrix is first clustered, and then a low-rank approximation of each cluster (i.e., diagonal block) is computed. The resulting algorithm is challenging to parallelize not only because of the large size of the data sets in social network analysis, but also because it requires computing with very diverse data structures ranging from extremely sparse matrices to dense matrices. In this paper, we describe the first parallel implementation of a clustered low-rank approximation algorithm for large social network graphs, and use it to perform link prediction in parallel. Experimental results show that this implementation scales well on large distributed-memory machines; for example, on a Twitter graph with roughly 11 million vertices and 63 million edges, our implementation scales by a factor of 86 on 128 processes and takes less than 2300 seconds, while on a much larger Twitter graph with 41 million vertices and 1.2 billion edges, our implementation scales by a factor of 203 on 256 processes with a running time about 4800 seconds.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Graph Spectra Regression with Low-Rank Approximation for Dynamic Graph Link Prediction∗

We propose a temporal regression model for dynamic graph link prediction problem, under spectral graph theory and low rank approximation for the graph Laplacian matrix. Link prediction is important in large-scale graphs including social networks, biological networks, power grid, etc. Most of these graphs have different characteristics such as degree distribution, due to their underlying samplin...

متن کامل

Clustered low rank approximation of graphs in information science applications

In this paper we present a fast and accurate procedure called clustered low rank matrix approximation for massive graphs. The procedure involves a fast clustering of the graph and then approximates each cluster separately using existing methods, e.g. the singular value decomposition, or stochastic algorithms. The cluster-wise approximations are then extended to approximate the entire graph. Thi...

متن کامل

Fast and Accurate Low Rank Approximation of Massive Graphs

In this paper we present a fast and accurate procedure called clustered low rank matrix approximation for massive graphs. The procedure involves a fast clustering of the graph and then approximating each cluster separately using existing methods, e.g. the singular value decomposition, or stochastic algorithms. The cluster-wise approximations are then extended to approximate the entire graph. Th...

متن کامل

Log-Normal Matrix Completion for Large Scale Link Prediction

The ubiquitous proliferation of online social networks has led to the widescale emergence of relational graphs expressing unique patterns in link formation and descriptive user node features. Matrix Factorization and Completion have become popular methods for Link Prediction due to the low rank nature of mutual node friendship information, and the availability of parallel computer architectures...

متن کامل

Hierarchical Hyperlink Prediction for the WWW

The hyperlink prediction task, that of proposing new links between webpages, can be used to improve search engines, expand the visibility of web pages, and increase the connectivity and navigability of the web. Hyperlink prediction is typically performed on webgraphs composed by thousands or millions of vertices, where on average each webpage contains less than fifty links. Algorithms processin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012